DeepMind Trains AI Agents Capable of Robust Real-time Cultural Transmission Without Human Data

Learning takes many forms in the human context, with environmental and social interactions high on the list. Much of what we regard as human intelligence derives from our ability to learn efficiently from other humans; this acquired knowledge can be categorized as culture and its transference between individuals as cultural transmission.

In the new paper Learning Robust Real-Time Cultural Transmission Without Human Data, a DeepMind Cultural General Intelligence team proposes a procedure for training AI agents capable of flexible, high-recall, robust real-time cultural transmission from human co-players in a rich 3D physical simulation without using human data in the training pipeline.

The team’s proposed AI agents are parameterized via a neural network, and deep reinforcement learning (RL) is used to train their weights. A novel aspect of this work is the application of agent-environment co-adaptation to create agents capable of robust real-time cultural transmission.

The team summarizes their work’s main contributions as:

We present a trained neural network solving the full cultural transmission problem, not only inferring information about the task from an expert as in prior work, but also remembering that information within an episode after the expert has dropped out, and leveraging the information to solve hard exploration problems.
We demonstrate that our cultural transmission agent can generalize in few-shot to a wider space of held-out tasks than previously considered, varying the positions of objects in the world, the structure of the game, and the behaviour of the expert player. Moreover, we show that our agent can generalize to cultural transmission from a human expert, a novel step.
Via ablations, we identify a minimal sufficient set of ingredients for the emergence of cultural transmission, including several elements which were not previously studied in this context, namely an other-agent attention auxiliary loss, within-episode expert dropout and automatic domain randomization.

To train and evaluate their proposed agents, the team designed GoalCycle3D, an open-ended RL environment that offers a diverse task space by virtue of procedural generation, 3D rigid-body physics, and continuous first-person sensorimotor control. This task space lets the researchers study the cultural transmission of navigational skills, which they chose given the foundational importance of navigation in human and animal cultures. The agents observe their environment via LIDAR sensors. During training they are also equipped with an Avatar sensor that outputs the 3-dimensional relative distance of the nearest co-player in the avatar’s frame of reference. Co-player avatars can be controlled by an expert bot, or “oracle,” that has privileged information about the correct order in which to traverse the goals and moves between them using simple heuristics. As such, the expert bot is not guaranteed to find the most efficient trajectory from one goal to the next.
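To make the Avatar sensor concrete, here is a minimal Python sketch of how such a reading could be computed. It is an illustration only, not GoalCycle3D code: the function name, the yaw-only rotation, and the (forward, left, up) axis convention are all assumptions made for this example.

```python
import math


def nearest_coplayer_offset(avatar_pos, avatar_yaw, coplayer_positions):
    """Offset to the nearest co-player, expressed in the avatar's own frame.

    Hypothetical helper, not part of any DeepMind API.
    avatar_pos: (x, y, z) world position of the observing avatar.
    avatar_yaw: heading in radians, counter-clockwise about the z (up) axis.
    coplayer_positions: list of (x, y, z) world positions of other avatars.
    Returns (forward, left, up), or None if there is no co-player.
    """
    if not coplayer_positions:
        return None

    # Pick the co-player with the smallest Euclidean distance.
    nearest = min(coplayer_positions, key=lambda p: math.dist(p, avatar_pos))
    dx, dy, dz = (n - a for n, a in zip(nearest, avatar_pos))

    # Rotate the world-frame offset by -yaw so the first axis points "forward".
    # Assumption: the avatar only rotates about the vertical axis.
    cos_y, sin_y = math.cos(avatar_yaw), math.sin(avatar_yaw)
    forward = cos_y * dx + sin_y * dy
    left = -sin_y * dx + cos_y * dy
    return (forward, left, dz)
```

For an avatar at the origin facing along the world y axis (yaw of pi/2), a co-player two units along y comes out as roughly two units straight ahead, with the farther co-player ignored.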

The proposed agents’ encoded observations are fed into a single-layer recurrent neural network (RNN) with an LSTM core. The output of the LSTM, which the team refers to as the “belief,” is then passed to three heads: a policy head, a value head and an auxiliary prediction head. The policy and value heads together implement the MPO algorithm, while the prediction head implements the attention loss, with the Avatar sensor observation serving as the prediction target.
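The data flow described above can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the paper's implementation: the class name `TinyAgent`, the layer sizes, the random initialization, and the squared-error form of the attention loss are all hypothetical, and MPO itself is not implemented here.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer; weights holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_linear(rng, n_out, n_in):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class TinyAgent:
    """Hypothetical agent: one LSTM core whose output feeds three heads."""

    def __init__(self, obs_size, hidden_size, action_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One dense map yields all four LSTM gates from [observation, hidden].
        self.gates = init_linear(rng, 4 * hidden_size, obs_size + hidden_size)
        self.policy_head = init_linear(rng, action_dim, hidden_size)
        self.value_head = init_linear(rng, 1, hidden_size)
        # Predicts the 3-D Avatar sensor reading (used only as a training target).
        self.prediction_head = init_linear(rng, 3, hidden_size)

    def initial_state(self):
        return [0.0] * self.hidden_size, [0.0] * self.hidden_size

    def step(self, encoded_obs, state):
        """encoded_obs is a list of floats; state is (hidden, cell)."""
        hidden, cell = state
        n = self.hidden_size
        z = linear(*self.gates, encoded_obs + hidden)
        i = [sigmoid(v) for v in z[:n]]             # input gate
        f = [sigmoid(v) for v in z[n:2 * n]]        # forget gate
        o = [sigmoid(v) for v in z[2 * n:3 * n]]    # output gate
        g = [math.tanh(v) for v in z[3 * n:]]       # candidate cell values
        cell = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, cell, i, g)]
        belief = [ok * math.tanh(ck) for ok, ck in zip(o, cell)]  # LSTM output
        outputs = {
            "policy_params": linear(*self.policy_head, belief),
            "value": linear(*self.value_head, belief)[0],
            "predicted_offset": linear(*self.prediction_head, belief),
        }
        return outputs, (belief, cell)


def attention_loss(predicted_offset, avatar_sensor_reading):
    """Assumed squared-error loss against the Avatar sensor reading."""
    return sum((p - t) ** 2 for p, t in zip(predicted_offset, avatar_sensor_reading))
```

The point of the sketch is the wiring: the belief is the only input to all three heads, so a gradient from the attention loss shapes the same recurrent state that the policy and value heads read.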

The team trained and tested their agents in procedurally generated 3D worlds containing colourful, spherical goals embedded in a noisy, obstacle-rich terrain. The resulting MEDAL-ADR agents embody what the researchers regard as a minimal sufficient set of representational and experiential biases to enable cultural transmission: memory (M), expert demonstrations (E), dropout (D), an attention loss (AL), and automatic (A) domain randomization (DR).
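The article does not describe how the automatic domain randomization component adjusts the task distribution. As a rough illustration of the general ADR idea only, the Python sketch below widens or narrows the sampling range of one environment parameter depending on how the agent scores on tasks drawn at the range boundary. The class name, the thresholds, and the choice to adapt only the upper bound are assumptions made for this example and are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class RandomizedParameter:
    """Hypothetical environment parameter with an adaptive sampling range."""

    low: float
    high: float
    step: float       # how far the upper bound moves per update
    max_high: float   # hard cap on the upper bound

    def sample(self, rng):
        """Draw a training value uniformly from the current range."""
        return rng.uniform(self.low, self.high)

    def update(self, boundary_score, expand_above, shrink_below):
        """Adapt the range from the agent's score on boundary tasks.

        A high score means the hardest sampled tasks are now easy, so the
        range grows; a low score means they are too hard, so it shrinks.
        Scores between the two thresholds leave the range unchanged.
        """
        if boundary_score >= expand_above:
            self.high = min(self.high + self.step, self.max_high)
        elif boundary_score <= shrink_below:
            self.high = max(self.high - self.step, self.low)
```

A training loop would call `sample` when generating each world and `update` after periodic evaluations, so that task difficulty tracks the agent's current ability rather than following a fixed schedule.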

The results of the team’s empirical experiments validate the proposed MEDAL-ADR agents’ strong within-episode recall, good generalization over a diverse task space, and high-fidelity knowledge transfer. The team hopes their work can pave the way for cultural evolution as an algorithm and the development of more generally intelligent artificial agents.

The paper Learning Robust Real-Time Cultural Transmission Without Human Data is on arXiv.

Author: Hecate He | Editor: Michael Sarazen
